Quality control in SARS-CoV-2 RBD-Fc vaccine production using LC–MS to confirm strain selection and detect contaminations from other strains

Coronavirus disease of 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is an ongoing outbreak, disrupting human life worldwide. Vaccine development was prioritized to obtain a biological substance for combating the viral pathogen and lessening disease severity. In vaccine production, biological origin and relevant materials must be carefully examined for potential contaminants in conformity with good manufacturing practice. Due to fast mutation, several SARS-CoV-2 variants and sublineages have been identified. Currently, most of COVID-19 vaccines are developed based on the protein sequence of the Wuhan wild type strain. New vaccines specific for emerging SARS-CoV-2 strains are continuously needed to tackle the incessant evolution of the virus. Therefore, in vaccine development and production, a reliable method to identify the nature of subunit vaccines is required to avoid cross-contamination. In this study, liquid chromatography-mass spectrometry using quadrupole-time of flight along with tryptic digestion was developed for distinguishing protein materials derived from different SARS-CoV-2 strains. After analyzing the recombinantly produced receptor-binding domain (RBD) of the SARS-CoV-2 spike protein, nine characteristic peptides were identified with acceptable limits of detection. They can be used together to distinguish 14 SARS-CoV-2 strains, except Kappa and Epsilon. Plant-produced RBD-Fc protein derived from Omicron strains can be easily distinguished from the others with 4–5 unique peptides. Eventually, a peptide key was developed based on the nine peptides, offering a prompt and precise flowchart to facilitate SARS-CoV-2 strain identification in COVID-19 vaccine manufacturing.

to distinguish Omicron BA1 and BA2 strains from the others.Other peptides (peptide 2, 3, 5, 6, 7, 8 and 9) are unique for certain SARS-CoV-2 strains.Peptide 7 with CVNFNFNGLK sequence is a unique peptide for Omicron BA1.Peptides 3 with the sequences of GNEVSQIAPGQTGNIADYNYK, QIAPGQTGNIADYNYK and QIAPGQTGTIADYNYK are unique for Omicron BA2, Beta and Gamma strains, respectively.Peptides 8 and 9 are unique for Zeta and Alpha strains, respectively.Peptide 5 is unique for Lambda strain with L135Q substitution.It is also applicable for separating Kappa and Epsilon strains from the others according to L135R substitution.However, Kappa and Epsilon cannot be distinguished from each other due to one-amino-acid difference.At position 167, glutamic acid (E) represents Epsilon strain and glutamine (Q) belongs to Kappa strain.This position is located on peptide 6.By nature, this peptide is hardly detected by LC-MS analysis due to high hydrophobicity and lengthy sequence (43 amino acids).Its MS and MS/MS signals were relatively low as compared to the other peptides (Supplementary Fig. S2).Therefore, Kappa and Epsilon strains cannot be differentiated by the developed LC-MS technique.Nonetheless, peptide 6 of certain SARS-CoV-2 strains was cleavable by trypsin due to lysine (K) substitution at T161 or E167 sequence, yielding characteristic peptides for Delta and Iota strains.Cleaved peptide 6 (peptide 6.2) can be used to differentiate Theta and Eta strains from each other.

Application of the peptide key to identify RBD-Fc proteins from different SARS-CoV-2 strains
Examples showing the stepwise identification of recombinant RBD-Fc proteins derived from different SARS-CoV-2 strains according to the nine characteristic peptides are shown in Fig. 4. In the top panel (case 1), Omicron BA2 strain was identified from the sample.Unique peptides 2 and 3 of Omicron BA2 were clearly detected.Peptides 1 and 4, common peptides for Omicron strains, were also found.In addition, peptides 2 and 4 of Wuhan strain were concomitantly detected, indicating that Wuhan strain might be mixed with Omicron BA2 in this sample.In the bottom panel (case 2), peptide 9 unique for Alpha strain and peptide 5 unique for Kappa or Epsilon strains were observed.In contrast, peptides 5 and 9 common for the other strains were not simultaneously detected.Unique peptides of the other strains were neither observed, implying that this sample could be a mixture of Alpha and Kappa strains or Alpha and Epsilon strains.www.nature.com/scientificreports/

Detection limit of potential contaminants in RBD-Fc protein
The detection limit of all nine characteristic peptides is displayed in Table 1.The lowest protein concentrations, where all peptides were detected in MS and MS/MS modes, were 0.5 and 5 ng ml −1 , respectively.By using the formula, limits of detection ranged between 4.58 to 113.37 ng ml −1 .The limit of peptide 6.1 detection was not determined since at least 100 ng ml −1 is required to obtain the signal of this peptide.Peptides 5 and 8 were the most sensitive peptides with the lowest concentrations of 5-10 ng ml −1 detected at MS/MS level, implying that contaminants of RBD-Fc proteins from Lambda, Kappa or Epsilon and Zeta strains can be delicately detected.Detections of Gamma strain with peptide 3 and Iota strain with peptide 6.1 were relatively sensitive at MS/MS level.Moreover, peptide 3 for Beta strain, peptide 4 for Omicron BA1 and BA2 strains, peptide 7 for Omicron BA1 strain and peptide 9 for Alpha strain showed relatively low limit of detection in MS analyzing mode as compared to MS/MS mode, suggesting that their fragmentations were tentatively low and MS data would be helpful for identifying these SARS-CoV-2 strains.

Discussion
In vaccine production, QC is an important step to confirm identities of raw materials, relevant ingredients and final products.They are also screened for unwanted contaminants within the QC processes 25,26 .In case of SARS-CoV-2 vaccine production, the presence of other viral strains in the drug substances and/or end products would lead to failure of product tests and may cause a drop in efficacy and/or increase the risk of adverse events 27 .Therefore, establishing a suitable method for checking authenticity of the SARS-CoV-2 strain selected for vaccine production and detecting contaminants from other strains is essential for vaccine quality assessment.LC-MS technique is widely known for its sensitivity, robustness, accuracy and high throughput 28 .It has a benefit over an enzyme-linked immunosorbent assay (ELISA) because it can detect several peptides from one or multiple viral strains in a single run 29 .LC-MS method has been broadly used in clinical research to identify SARS-CoV-2 variants, but surprisingly its application in manufacturing process of biological products has less been found 30 .
This study demonstrated that LC-MS can be a useful tool for evaluating product quality in biopharmaceutical production.www.nature.com/scientificreports/Wuhan and early developed SAR-CoV-2 strains, for example Alpha, Beta, Gamma, Delta and Epsilon show little variation among their RBD amino acid sequences, demonstrating their close relationships.However, more recently emerging Omicron strains show 17 amino acids different from Wuhan wild type, indicating substantial evolution of Omicron lineages (Fig. 1).All distinct amino acids detected among 14 SARS-CoV-2 strains contribute to nine characteristic peptides upon tryptic digestion.Most of the characteristic peptides can be readily detected in LC-MS analysis but unmodified peptide 6 is hardly detected.The sequence between 150D and 192R positions (43 amino acids in length) has neither arginine (R) nor lysine (K), digestive sites of trypsin enzyme, causing a lengthy peptide after digestion.The large peptide could become low hydrophilicity, poor ionization, limited fragmentation and could comprise multiple charge states, resulting in low or no signal in LC-MS analysis 31,32 .In the RBD-Fc sequences of different SARS-CoV-2 strains, including Delta, Iota, Theta and Eta strains, T161 or E167 position is substituted with K, yielding digestible peptides, which are useful for differentiating these strains from the others and from each other (Fig. 3).Peptides 1, 2, 3, 4 and 7 are applicable for detecting Omicron BA1 and BA2.Peptide 3 is helpful for identifying Beta and Gamma strains.Peptides 8 and 9 are required for identifying Zeta and Alpha strains, respectively.Peptide 5 is unique for Lambda strain and applicable for distinguishing Kappa and Epsilon strains from the others.When all characteristic peptides were used together (Fig. 3) and the identification processes were followed step by step from the top to the bottom, identification of SAR-CoV-2 strains can be achieved in a single attempt, even from a mix of recombinant RBD-Fc proteins derived from different viral strains (Fig. 4).
However, Kappa and Epsilon strains cannot be differentiated from each other with this LC-MS method because they have only one amino-acid difference, which does not contribute to a characteristic peptide after tryptic digestion.To overcome this issue, GluC protease could be an alternative enzyme for using alone or together with trypsin enzyme to additionally cleave the proteins at C-terminus of glutamic acid (E).GluC enzyme could facilitate the differentiation of SARS-CoV-2 RBD-Fc proteins derived from Epsilon and Kappa strain because the sequence of Epsilon strain contains glutamic acid at position 167 (E167), which is cleavable by GluC enzyme.However, this amino acid is mutated to glutamine (Q167) for Kappa strain.After GluC digestion, the resulting peptides would be different between the samples derived from Epsilon and Kappa strains and therefore distinguishable by LC-MS analysis.Apart from GluC enzyme, other proteases, such as LysC, ArgC and AspN enzymes, could be additionally applied to increase completeness of protein digestion at lysine (K) and arginine (R) positions or additionally cleave the peptide at aspartic acid (D) position to increase number of resulting peptides.This could lead to better detection and differentiation of vaccine products derived from closely related SARS-CoV-2 strains.However, time and cost would be factors of concern as increased number of enzymes will double digestion time and chemical use.
Post-translational modifications of N-glycosylation could be another factor, affecting LC-MS results and analytical performance to confirm protein identity and detect contaminations.In this study, characteristic peptide 1 has one glycosylation site at N26 amino acid.Sometimes, MS/MS signal of the non-glycosylated form was not observed.Peptide 1 is important for differentiating RBD-Fc proteins derived from Omicron BA1 and BA2 from the others.To improve the MS/MS signal of peptide 1 and abate the effects of N-glycosylation, protease www.nature.com/scientificreports/enzymes, such as Endo-S and PNGase A, could be additionally used to release N-glycans from the core protein structure.Nonetheless, other characteristic peptides, including peptides 2, 3, 4 and 7, are appliable for distinguishing Omicron BA1 and BA2 strains from each other and from the others.Hence, Endo-S and PNGase A enzymes might not be necessary in this study, but they would benefit the other studies, discovering that only N-glycosylated peptide is a characteristic peptide.
In typical LC-MS analysis, the resulting peptides are matched to the reference sequence of individual proteins.For example, the result from Wuhan SARS-CoV-2 RBD-Fc sample is usually compared against the reference amino acid sequence of Wuhan strain only (Fig. 2).Identity of Wuhan strain might be confirmed but potential contaminants from other strains would not be identified.By having the peptide key alongside the reference of target protein, purity of intermediate substances and vaccine products can be more ascertained.Generally, the limit of LC-MS detection is in a range of 1-10 ng ml −1 solution or ng mg −1 protein 33,34 .Within this study, a protein spike-in assay was performed to examine the limits of peptide detection.Among nine characteristic peptides, the lowest points of peptide detection at MS/MS level were around 5 ng ml −1 , observed from peptide 5 and 8 (Table 1).These two peptides are unique for detecting RBD-Fc derived from Lambda, Zeta, Epsilon or Kappa strains.Peptide 3 for detecting RBD-Fc derived Gamma strain and peptide 6.1 for detecting RBD-Fc derived from Iota strain were detectable at concentration around 20 ng ml −1 , demonstrating relatively sensitive detection by this developed LC-MS method.Detection limits of other peptides were in a range of 50-200 ng ml −1 , which would be sufficient for detecting even small contaminations in our vaccine production.Contaminants from other strains likely occur at microgram level since the tested materials are typically prepared at 1 mg ml −1 .
According to the WHO guideline, cross-contamination of biological materials at any stage during biopharmaceutical manufacturing is a risk and must be assessed and controlled 19 .In our production system, biological materials can be derived from different SARS-CoV-2 strains, therefore appropriate logistic plans and activities, such as a clear separation of storage areas between different bacteria cell banks and avoiding infiltrating different strains at the same time, must be exercised to reduce the risk.These processes have already been implemented in the manufacturing plant of Baiya Phytopharm.An implementation of LC-MS analysis in the QC process will ensure the purity of starting materials and final vaccine products.Furthermore, molecular techniques, such as real-time PCR, could be further developed to check contaminations of different SARS-CoV-2 strains in bacterial clones and DNA plasmids during cloning and transformation steps.It could also be used to track down microbial contaminations and host cell DNA residues in the final vaccine products 35 .By using bioanalytical and molecular techniques in combination, QC analysis in biologic manufacture would be more forceful to ensure production integrity and product quality.

Conclusion
The LC-MS approach was developed to confirm the RBD sequence of Wuhan wild type strain in SARS-CoV-2 RBD-Fc vaccine production and to detect contaminations of other SARS-CoV-2 strains in the products.The method was incorporated into our QC procedures to ensure product quality.The generated peptide key, comprising nine characteristic peptides, facilitates SARS-CoV2 strain identification.All 14 SARS-CoV-2 strains, except Kappa and Epsilon, can be clearly distinguished in one step of detection.GluC enzyme could be further applied to facilitate the differentiation of RBD-Fc derived from Kappa and Epsilon strains.The limit of LC-MS detection likely covers the range of protein contaminations in drug substances and final vaccine products.This method would not only improve QC system in our vaccine production but can be developed for characterizing SARS-CoV-2 variants in other clinical or virological COVID-19-related research.

Plant materials
Original seeds of Nicotiana benthamiana plant used in this study were supported by Dr. Supaart Sirikantaramas, Faculty of Science, Chulalongkorn University.Plants were grown in a closed room under controlled conditions at Faculty of Pharmaceutical Sciences, Chulalongkorn University with permission from the Institutional Biosafety Committee of Chulalongkorn University (CU-IBC).Plant material collection and waste management complied with the safety guidelines regulated by the Center for Safety, Health and Environment of Chulalongkorn University (SHECU).

Construction of SARS-CoV-2 RBDs in conjugation with Fc region of IgG1
Nucleotide sequences of the RBD region of various SARS-CoV-2 strains, including Wuhan wild-type, Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, Omicron BA1 and Omicron BA2 were obtained from the NCBI database and synthesized using an oligo synthesizer.The RBD sequences were fused with the sequences of human Fc region immunoglobulin G1 (IgG1) using 3 sets of GGGGS linker.The KDEL sequence was added at C-terminus of Fc region to establish high-mannose N-glycan patterns 36 .Recombinant DNA was ligated into a geminiviral vector, pBaiya, and transformed into Agrobacterium tumefaciens GV3101 using electroporation method 37 .A graphic illustration showing amino acid differences within the RBD region among all SAR-CoV-2 strains is depicted in Fig. 1.

Transient expression of RBD-Fc fusion protein in N. benthamiana
Infiltration steps were performed as described previously 38,39 .Briefly, A. tumefaciens, containing the pBaiya vector inserted with SARS-CoV-2 RBD-Fc IgG1 recombinant protein, was grown at 28 °C for two days and infiltrated into three-week-old hydroponically grown N. benthamiana host plants using a vacuum chamber.Infected plants were maintained at 28 °C with 16 h light and 8 h darkness for three days when wilting symptoms were observed.www.nature.com/scientificreports/

Protein purification
Plant leaves were manually harvested and extracted using a 1 × phosphate buffer saline (PBS) buffer.Leaf extracts were initially filtered using a filter cloth and then centrifuged at 26,000×g for 40 min at 4 °C.Supernatant was filtered again through 0.45 µm membrane filter using vacuum filtration and loaded onto protein A column.The column was washed with 1 × PBS and the protein was eluted using 0.1 M glycine buffer, pH 2.9.Protein solution was neutralized with 1 M Tris-HCl, pH 8.8 and then dialyzed against 1 × PBS (cell grade) for three cycles.The dialyzed protein was concentrated using Amicon Ultra-4 30 kDa molecular weight cutoff (MWCO) centrifugal device.Protein concentration was measured using Bradford assay and adjusted to a final concentration of 1 mg ml −1 .

Tryptic digestion
Approximately 20 µg of plant-produced SARS-CoV-2 RBD-Fc proteins of all strains was desalted and buffer exchanged to 100 µl of 50 mM ammonium bicarbonate (ABC) buffer using Pall NanoSep 30 kDa MWCO centrifugal device.Final protein concentration was approximately 200 µg ml −1 .Protein solution was reduced with 10 mM dithiothreitol (DTT) at 65 °C for 30 min and alkylated with 25 mM iodoacetamide (IAA) at room temperature for 20 min under darkness.Proteins were digested with 0.5 µg trypsin at 37 °C for 4 h.The reaction was stopped using 10% formic acid (FA) and centrifuged at 14,000 rpm for 10 min.Supernatant was transferred to a polypropylene vial and subjected to LC-MS/MS analysis.For peptides with charge + 1 and + 2, the CE was calculated using a formula of (3.1 × ((m/z)/100) + 1), while peptides with charge ≥ + 3, the CE was calculated using a formula of (3.6 × ((m/z)/100) − 4.8).

Data analysis
Raw data was uploaded to Agilent MassHunter BioConfirm software version 11.0.The data was processed using peptide mapping workflow.Total ion chromatogram was integrated and extracted to obtain MS spectrum using default settings.The resulting peptides were matched against the reference SARS-CoV-2 protein sequences with MS mass error at ± 10 ppm and MS/MS mass error at ± 30 ppm.False discovery rate (FDR) was set at 1%.Only peptides with MS/MS fragmentations were considered for protein identification.Trypsin was selected as a digestion method.Cysteine carbamidomethylation was a fixed modification.Phosphorylation and oxidation were selected as variable modifications.Matched peptides were used to confirm RBD sequence of Wuhan SARS-CoV-2 strain (Fig. 2) and to create the peptide key for detecting contaminations derived from RBD-Fc of other strains (Fig. 3).

Determination of detection limit of each characteristic peptide
Wuhan RBD-Fc proteins were mixed with the proteins derived from other 13 SARS-CoV-2 strains, including Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, Omicron BA1 and Omicron BA2 in a final concentration of 200 µg ml −1 .Protein mixture was diluted to 10 different concentrations, including 100, 50, 20, 10, 5, 2, 1, 0.5, 0.2 and 0.1 µg ml −1 .The samples were tryptic digested and processed to LC-MS analysis according to the method described above.The lowest dilution that the peptide was detected in MS and MS/ MS modes is reported.In addition, peak areas of the peptide detected in MS mode were plotted against protein concentrations.The detection limit was calculated using the formula of (3.3 × standard deviation of the response) / slope of the calibration curve.

Figure 1 .Figure 2 .
Figure 1.Amino acid differences of RBD region among all 14 SARS-CoV-2 strains included in this study.(a) Protein construct of SARS-CoV-2 RBD-Fc IgG1 fusion with magnification of amino acid sequences along the RBD region, showing nine characteristic peptides obtained upon trypsin digestion.Amino acid changes are highlighted in yellow.(b) Details of amino acid changes at RBD region of 13 SARS-CoV-2 strains as compared to Wuhan strain."Spike no." refers to the corresponding position in a full-length sequence of Wuhan SARS-CoV-2 spike glycoprotein.

Figure 3 .
Figure 3. Peptide key with nine characteristic peptides for identification of SARS-CoV-2 strains.The identification process flows from the top to the bottom.Left side of the flowchart shows peptide common for Wuhan and other strains, whereas right side of the flowchart shows peptide unique for particular SARS-CoV-2 strains.

Figure 4 .
Figure 4. Examples showing applications of the peptide key to identify RBD-Fc proteins derived from different SARS-CoV-2 strains.In the top panel (case 1), Omicron BA2 was mainly identified, along with partial sequences of Wuhan strain.In the bottom panel (case 2), either Alpha and Kappa or Alpha and Epsilon were two SARS-CoV-2 strains mixed in this RBD-Fc sample.

Table 1 .
Detection limit of nine characteristic peptides.
Vol.:(0123456789) Scientific Reports | (2024) 14:9629 | https://doi.org/10.1038/s41598-024-59860-4 Peptide samples were analyzed using Agilent 1290 Infinity II LC system coupled with Agilent 6545XT Q-TOF mass spectrometer.LC separation was conducted on AdvanceBio Peptide Mapping column (120 Å, 2.1 × 150 mm, 2.7 µm) at 60 °C.Injection volume was 10 µl.Mobile phase A was 0.1% FA in water and mobile phase B was 0.1% FA in acetonitrile.LC gradient was set as follows; 0% B for 2 min, 0-20% B in 33 min, 20-30% B in 20 min, 30-50% B in 10 min, 50-90% B in 5 min, 90% B for 5 min, 90-0% B in 5 min and 0% B for 5 min, with constant flow rate of 0.4 ml min −1 .MS analysis was conducted in positive mode with a mass range of 100-1700 m/z.MS parameters were set as follows; gas temperature at 325 °C, nebulizer at 35 psi, dying gas at 13 L min −1 , sheath gas temperature at 350 °C, sheath gas flow at 12 L min −1 , capillary voltage at 4000 V, nozzle voltage at 500 V, fragmentor voltage at 175 V and skimmer voltage at 65 V. Acquisition time was 1 spectrum per s.Maximum 2 precursor ions per cycle were selected for MS/MS fragmentation.Collision energy (CE) was varied according to the charge state of the peptide.